subtask 2
Explaining novel senses using definition generation with open language models
Fedorova, Mariia, Kutuzov, Andrey, Periti, Francesco, Scherrer, Yves
We apply definition generators based on open-weights large language models to the task of creating explanations of novel senses, taking target word usages as an input. To this end, we employ the datasets from the AXOLOTL'24 shared task on explainable semantic change modeling, which features Finnish, Russian and German languages. We fine-tune and provide publicly the open-source models performing higher than the best submissions of the aforementioned shared task, which employed closed proprietary LLMs. In addition, we find that encoder-decoder definition generators perform on par with their decoder-only counterparts.
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (8 more...)
mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection
The large language models (LLMs) are able to generate high-quality texts in multiple languages. Such texts are often not recognizable by humans as generated, and therefore present a potential of LLMs for misuse (e.g., plagiarism, spams, disinformation spreading). An automated detection is able to assist humans to indicate the machine-generated texts; however, its robustness to out-of-distribution data is still challenging. This notebook describes our mdok approach in robust detection, based on fine-tuning smaller LLMs for text classification. It is applied to both subtasks of Voight-Kampff Generative AI Detection 2025, providing remarkable performance (1st rank) in both, the binary detection as well as the multiclass classification of various cases of human-AI collaboration.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training
Thelen, Christian Rene, Blaneck, Patrick Gustav, Bornheim, Tobias, Grieger, Niklas, Bialonski, Stephan
Positive, supportive online communication in social media (candy speech) has the potential to foster civility, yet automated detection of such language remains underexplored, limiting systematic analysis of its impact. We investigate how candy speech can be reliably detected in a 46k-comment German YouTube corpus by monolingual and multilingual language models, including GBERT, Qwen3 Embedding, and XLM-RoBERTa. We find that a multilingual XLM-RoBERTa-Large model trained to detect candy speech at the span level outperforms other approaches, ranking first in both binary positive F1: 0.8906) and categorized span-based detection (strict F1: 0.6307) subtasks at the GermEval 2025 Shared Task on Candy Speech Detection. We speculate that span-based training, multilingual capabilities, and emoji-aware tokenizers improved detection performance. Our results demonstrate the effectiveness of multilingual models in identifying positive, supportive language.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Communications > Social Media (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)
AraHealthQA 2025: The First Shared Task on Arabic Health Question Answering
Alhuzali, Hassan, Al-Eisawi, Walid, Abdul-Mageed, Muhammad, Abouzahir, Chaimae, Abu-Daoud, Mouath, Alasmari, Ashwag, Al-Monef, Renad, Alqahtani, Ali, Ayash, Lama, Kharouf, Leen, Shamout, Farah E., Habash, Nizar
We introduce AraHealthQA 2025, the Comprehensive Arabic Health Question Answering Shared Task, held in conjunction with ArabicNLP 2025 (co-located with EMNLP 2025). This shared task addresses the paucity of high-quality Arabic medical QA resources by offering two complementary tracks: MentalQA, focusing on Arabic mental health Q&A (e.g., anxiety, depression, stigma reduction), and MedArabiQ, covering broader medical domains such as internal medicine, pediatrics, and clinical decision making. Each track comprises multiple subtasks, evaluation datasets, and standardized metrics, facilitating fair benchmarking. The task was structured to promote modeling under realistic, multilingual, and culturally nuanced healthcare contexts. We outline the dataset creation, task design and evaluation framework, participation statistics, baseline systems, and summarize the overall outcomes. We conclude with reflections on the performance trends observed and prospects for future iterations in Arabic health QA.
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
- Europe > United Kingdom > Wales (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
A Multi-Strategy Approach for AI-Generated Text Detection
Zain, Ali, Farooqui, Sareem, Rafi, Muhammad
This paper presents presents three distinct systems developed for the M-DAIGT shared task on detecting AI generated content in news articles and academic abstracts. The systems includes: (1) A fine-tuned RoBERTa-base classifier, (2) A classical TF-IDF + Support Vector Machine (SVM) classifier , and (3) An Innovative ensemble model named Candace, leveraging probabilistic features extracted from multiple Llama-3.2 models processed by a customTransformer encoder.The RoBERTa-based system emerged as the most performant, achieving near-perfect results on both development and test sets.
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Singapore (0.04)
- Asia > Pakistan > Sindh > Karachi Division > Karachi (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.90)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)
CareLab at #SMM4H-HeaRD 2025: Insomnia Detection and Food Safety Event Extraction with Domain-Aware Transformers
Liang, Zihan, Pan, Ziwen, Dey, Sumon Kanti, Ismail, Azra
This paper presents our system for the SMM4H-HeaRD 2025 shared tasks, specifically Task 4 (Subtasks 1, 2a, and 2b) and Task 5 (Subtasks 1 and 2). Task 4 focused on detecting mentions of insomnia in clinical notes, while Task 5 addressed the extraction of food safety events from news articles. We participated in all subtasks and report key findings across them, with particular emphasis on Task 5 Subtask 1, where our system achieved strong performance--securing first place with an F1 score of 0.958 on the test set. To attain this result, we employed encoder-based models (e.g., RoBERTa), alongside GPT -4 for data augmentation. This paper outlines our approach, including preprocessing, model architecture, and subtask-specific adaptations.
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
Zhao, Henry Hengyuan, Gao, Difei, Shou, Mike Zheng
Current GUI agents have achieved outstanding performance in GUI element grounding. However, planning remains highly challenging, especially due to sensitivity to the initial state of the environment. Specifically, slight differences in the initial state-such as the target software not being open or the interface not being in its default state-often lead to planning errors. This issue is widespread in real user scenarios, but existing benchmarks fail to evaluate it. In this paper, we present WorldGUI, a novel GUI benchmark that designs GUI tasks with various initial states to simulate real computer-user interactions. The benchmark spans a wide range of tasks across 10 popular software applications, including PowerPoint, VSCode, and Adobe Acrobat. In addition, to address the challenges of dynamic GUI automation tasks, we propose GUI-Thinker, a holistic framework, leveraging a critique mechanism, that effectively manages the unpredictability and complexity of GUI interactions. Experimental results demonstrate that GUI-Thinker significantly outperforms Claude-3.5 (Computer Use) by 14.9% in success rate on WorldGUI tasks. This improvement underscores the effectiveness of our critical-thinking-based framework in enhancing GUI automation.
Online Social Support Detection in Spanish Social Media Texts
Tash, Moein Shahiki, Ramos, Luis, Ahani, Zahra, Monroy, Raul, kolesnikova, Olga, Calvo, Hiram, Sidorov, Grigori
The advent of social media has transformed communication, enabling individuals to share their experiences, seek support, and participate in diverse discussions. While extensive research has focused on identifying harmful content like hate speech, the recognition and promotion of positive and supportive interactions remain largely unexplored. This study proposes an innovative approach to detecting online social support in Spanish-language social media texts. We introduce the first annotated dataset specifically created for this task, comprising 3,189 YouTube comments classified as supportive or non-supportive. To address data imbalance, we employed GPT-4o to generate paraphrased comments and create a balanced dataset. We then evaluated social support classification using traditional machine learning models, deep learning architectures, and transformer-based models, including GPT-4o, but only on the unbalanced dataset. Subsequently, we utilized a transformer model to compare the performance between the balanced and unbalanced datasets. Our findings indicate that the balanced dataset yielded improved results for Task 2 (Individual and Group) and Task 3 (Nation, Other, LGBTQ, Black Community, Women, Religion), whereas GPT-4o performed best for Task 1 (Social Support and Non-Support). This study highlights the significance of fostering a supportive online environment and lays the groundwork for future research in automated social support detection.
- South America (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Central America (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
- Law > Civil Rights & Constitutional Law (0.46)
JuniperLiu at CoMeDi Shared Task: Models as Annotators in Lexical Semantics Disagreements
We present the results of our system for the CoMeDi Shared Task, which predicts majority votes (Subtask 1) and annotator disagreements (Subtask 2). Our approach combines model ensemble strategies with MLP-based and threshold-based methods trained on pretrained language models. Treating individual models as virtual annotators, we simulate the annotation process by designing aggregation measures that incorporate continuous relatedness scores and discrete classification labels to capture both majority and disagreement. Additionally, we employ anisotropy removal techniques to enhance performance. Experimental results demonstrate the effectiveness of our methods, particularly for Subtask 2. Notably, we find that standard deviation on continuous relatedness scores among different model manipulations correlates with human disagreement annotations compared to metrics on aggregated discrete labels. The code will be published at https://github.com/RyanLiut/CoMeDi_Solution.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- (13 more...)
A Simple Yet Effective Corpus Construction Framework for Indonesian Grammatical Error Correction
Lin, Nankai, Zeng, Meiyu, Huang, Wentao, Jiang, Shengyi, Xiao, Lixian, Yang, Aimin
Currently, the majority of research in grammatical error correction (GEC) is concentrated on universal languages, such as English and Chinese. Many low-resource languages lack accessible evaluation corpora. How to efficiently construct high-quality evaluation corpora for GEC in low-resource languages has become a significant challenge. To fill these gaps, in this paper, we present a framework for constructing GEC corpora. Specifically, we focus on Indonesian as our research language and construct an evaluation corpus for Indonesian GEC using the proposed framework, addressing the limitations of existing evaluation corpora in Indonesian. Furthermore, we investigate the feasibility of utilizing existing large language models (LLMs), such as GPT-3.5-Turbo and GPT-4, to streamline corpus annotation efforts in GEC tasks. The results demonstrate significant potential for enhancing the performance of LLMs in low-resource language settings. Our code and corpus can be obtained from https://github.com/GKLMIP/GEC-Construction-Framework.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Guangdong Province > Guangzhou (0.05)
- North America > United States > Maryland > Baltimore (0.04)
- (6 more...)